🕷️ Job Radar • SCRAPING

Live freelance tracking. Raw descriptions turned into structured data. Find your next tech project without the noise.

upwork.com 🟢 2026-05-18

🔹 Process and analyze a large database of Indian names
👀 Client: 🇺🇸 USA, member since 2024-11-07
💰 Price: $5.00-$60.00 Hourly
🚩 Problem: Need to clean, preprocess, and analyze a massive dataset of Indian names for demographic insights.
📦 Existing: Not specified

Specifications:

[Target] Clean and pre-process raw .csv files with OCR and Unicode artifacts
[Method] Use Python (pandas/polars, regex, Unicode text handling)
[UI/UX] Not applicable
[Stack] Python, pandas, polars, regex, IndicXlit/AI4Bharat
[Security] Ensure data privacy and security during processing
[Format] Output structured JSON for further analysis
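
To make the [Target] item concrete, here is a minimal sketch of Unicode cleanup using only the Python standard library. The helper name `clean_name` and the list of characters to strip are assumptions for illustration; the listing does not specify the client's actual cleaning rules.

```python
import re
import unicodedata

# Invisible characters that often survive OCR or copy-paste (assumed list).
# ZWNJ/ZWJ (U+200C, U+200D) are deliberately kept: they can be meaningful
# in Indic scripts such as Devanagari.
_INVISIBLE = re.compile(r"[\u200b\u2060\ufeff]")
_WHITESPACE = re.compile(r"\s+")


def clean_name(raw: str) -> str:
    """Normalize a raw name string (hypothetical helper, not from the listing)."""
    # NFC gives each character sequence one canonical form.
    text = unicodedata.normalize("NFC", raw)
    text = _INVISIBLE.sub("", text)
    # Replace control characters (Unicode category Cc) with a space.
    text = "".join(" " if unicodedata.category(ch) == "Cc" else ch for ch in text)
    # Collapse runs of whitespace and trim the ends.
    return _WHITESPACE.sub(" ", text).strip()
```

The same function could be applied column-wise in pandas or polars once the .csv files are loaded.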

Workflow:

1. Import raw .csv files into a DataFrame using pandas or polars.
2. Handle OCR and Unicode artifacts by cleaning text data.
3. Transliterate names from Devanagari and Gurmukhi to Latin script using IndicXlit/AI4Bharat.
4. Normalize and standardize names according to pre-specified rules.
5. Extract personal names and surnames from full and parental name fields using rule-based parsing.
6. Apply existing ML algorithms to classify names and infer religion from them.
7. Construct frequency-based measures across age cohorts and geographies.
8. Output results in structured JSON format.
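
Steps 5, 7, and 8 can be sketched roughly as follows, using only the standard library. Everything specific here is an assumption for illustration: the column names (`full_name`, `birth_year`, `state`), the "last token is the surname" rule, the ten-year cohort width, and the made-up sample rows. The client's pre-specified rules would replace them.

```python
import csv
import io
import json
from collections import Counter, defaultdict


def split_name(full_name: str) -> tuple[str, str]:
    """Naive rule (assumption): last token is the surname, the rest is the given name."""
    parts = full_name.split()
    if len(parts) < 2:
        return full_name.strip(), ""
    return " ".join(parts[:-1]), parts[-1]


def cohort(birth_year: int, width: int = 10) -> str:
    """Label a birth year with its cohort range, e.g. 1987 becomes '1980-1989'."""
    start = birth_year - birth_year % width
    return f"{start}-{start + width - 1}"


def surname_shares(rows) -> dict:
    """Relative surname frequency within each (state, cohort) group."""
    counts = defaultdict(Counter)
    for row in rows:
        _, surname = split_name(row["full_name"])
        if surname:
            counts[(row["state"], cohort(int(row["birth_year"])))][surname] += 1
    result = {}
    for (state, label), counter in counts.items():
        total = sum(counter.values())
        result.setdefault(state, {})[label] = {
            name: round(n / total, 3) for name, n in counter.most_common()
        }
    return result


# Made-up sample rows, only to show the shape of the input and output.
SAMPLE_CSV = """full_name,birth_year,state
Amit Sharma,1984,Delhi
Neha Sharma,1989,Delhi
Ravi Verma,1981,Delhi
Gurpreet Singh,1992,Punjab
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = surname_shares(rows)
# ensure_ascii=False keeps any non-Latin names readable in the JSON output.
print(json.dumps(report, ensure_ascii=False, indent=2))
```

Nesting the output by geography and then by cohort keeps the JSON easy to filter downstream. A flat list of records would work equally well if the later analysis prefers a tabular shape.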

⚡ Receive notifications instantly. Join our community.